Skip to content

bote795/recommendMeSenpai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

recommendMeSenpai

##Website

##PreProcessing Main Files:

  • anime.json

  • contains all the anime/Manga from animelist at the time we obtained data

  • 42k records of anime/manga

  • users.json.zip

    • contains users and what they rated different animes/Mangas
    • 200 million records

    To be able to obtain our data:

    • anime-min.json
      • a filtered version of anime.json containing only anime
    • labels-min.json
      • runing all pictures from anime-min.json through i2v and retrieved labels for each pic
    • labels2-min.json
      • runing all pictures from test-2.json through i2v and retrieved labels for each pic
    • test-2.json
      • crawled extra pics
    • users-200.min.json
      • extract 200 records from users.json.zip
    • users-1000.min.json
      • extract 1000 records from users.json.zip

###How to generate our data #####******* NOTE: All directory paths must be change to fit your setup *******

First run the preProcessData.py

python preProcessData.py 

that will create anime-min.json

In order to execute the retrievePicTags.py You must first clone i2v
install dependencies

pip install scikit-image
pip install numpy
pip install scipy
pip install Pillow
pip install chainer

and download the following files from website:

  • tag_list.json
  • illust2vec_tag_ver200.caffemodel You will then have to move "retrievePicTag.py" into the inside of the cloned Repo Runing retrievePicTag.py will generate labels-min.json
python retrievePicTag.py

The command above will take different amounts of time depending on your CPU:

  • i5: 5min per picture
  • i7: 500ms per picture

We must then retireve all the extra pictures we downloaded using scrapy #####install

pip install scrapy 

#####Execute This will take about ~8 hrs

cd myAnimeList 
scrapy crawl myAnimeList --set DOWNLOAD_DELAY=8 -o test-2.json

You will need the data from the step above to do the step below
If you execute all cells in "notebooks/retrieve labels round 2.ipynb"
it will generate labels2-min.json

###Team

  • Nicolas Botello
  • Yang Yang
  • Austin Tang
  • Connor Flatt

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published